Space-Efficient Support for Temporal Text Indexing in a Document Archive Context

Author

  • Kjetil Nørvåg
Abstract

Support for temporal text-containment queries (queries for all versions of documents that contained one or more particular words at a particular time t) is of interest in a number of contexts, including web archives, smaller-scale temporal XML/web warehouses, and temporal document database systems in general. In the V2 temporal document database system, we employ a combination of full-text indexes and variants of time indexes to perform efficient text-containment queries. This approach was optimized for moderately large temporal document databases. However, for extremely large databases, the index space usage of this approach could be too large. In this paper, we present a more space-efficient solution to the problem: the interval-based temporal text index (ITTX). We describe its architecture, present appropriate algorithms for update and retrieval, and discuss the advantages and disadvantages of the two approaches.
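
To make the query type concrete, the following is a minimal sketch of an interval-based posting list, written under stated assumptions. It is not the ITTX design from the paper, whose details are not given in this abstract. The names `Posting`, `IntervalTextIndex`, `add_version`, and `query` are hypothetical, timestamps are plain integers, and a multi-word query is assumed to be conjunctive. The idea illustrated is that a word which survives several versions of a document is stored as one posting with a validity interval rather than as one posting per version.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Posting(NamedTuple):
    doc_id: int
    start: int            # time at which the word first appears in the document
    end: Optional[int]    # time at which the word disappears; None = still present


class IntervalTextIndex:
    """Hypothetical interval-based text index (illustration only)."""

    def __init__(self):
        self._postings = defaultdict(list)  # word -> list of Posting
        self._current = {}                  # doc_id -> words in the latest version

    def add_version(self, doc_id, words, t):
        """Register a new version of doc_id, valid from time t onward."""
        new = set(words)
        old = self._current.get(doc_id, set())
        # Words dropped in this version: close their open interval at t.
        for word in old - new:
            plist = self._postings[word]
            for i in range(len(plist) - 1, -1, -1):
                p = plist[i]
                if p.doc_id == doc_id and p.end is None:
                    plist[i] = p._replace(end=t)
                    break
        # Words introduced in this version: open a new interval at t.
        for word in new - old:
            self._postings[word].append(Posting(doc_id, t, None))
        # Words present in both versions need no new posting at all.
        self._current[doc_id] = new

    def query(self, words, t):
        """Return ids of documents that contained all given words at time t."""
        result = None
        for word in words:
            docs = {
                p.doc_id
                for p in self._postings.get(word, [])
                if p.start <= t and (p.end is None or t < p.end)
            }
            result = docs if result is None else result & docs
        return result or set()


idx = IntervalTextIndex()
idx.add_version(1, ["temporal", "index"], 10)
idx.add_version(1, ["temporal", "archive"], 20)
idx.add_version(2, ["index"], 15)
print(idx.query(["index"], 17))
```

In this sketch the word "temporal" occupies a single posting for document 1 even though two versions contain it, which is where the space saving of an interval representation would come from. The price is extra work on update, since open intervals must be located and closed.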

Similar Resources

Indexing Techniques for Temporal Text Containment Queries

Many information management systems maintain multiple time stamped versions of documents. The archives of web pages, version control systems, wikis and backup mechanisms are examples of such systems. For such temporally versioned document collections, a search using keywords along the temporal dimension is valuable. This paper studies the temporal dimension of keyword search in the context of t...

Improving Space-Efficiency in Temporal Text-Indexing

Support for temporal text-containment queries is of interest in a number of contexts. In previous papers we have presented two approaches to temporal text-indexing, the V2X and ITTX indexes. In this paper, we first present improvements to the previous techniques. We then perform a study of the space usage of the indexing approaches based on both analytical models and results from indexing tempo...

Image Retrieval: Content versus Context

In this paper, we introduce a new approach to image retrieval. This new approach takes the best from two worlds, combines image features (content) and words from collateral text (context) into one semantic space. Our approach uses Latent Semantic Indexing, a method that uses co-occurrence statistics to uncover hidden semantics. This paper shows how this method, that has proven successful in bot...

Privacy-Preserving Text Indexing for Search of Documents

Protection of content of sensitive text documents is important in enterprise intranets. An index structure is needed to support efficient search and retrieval, but it can lead to information leakage; by statistical attacks an adversary can draw probabilistic inference about the contents of document collection. Zerr and others present a confidential index structure and the ranking of retrieved d...

Fast Incremental Indexing for Full-Text Information Retrieval

Full-text information retrieval systems have traditionally been designed for archival environments. They often provide little or no support for adding new documents to an existing document collection, requiring instead that the entire collection be re-indexed. Modern applications, such as information filtering, operate in dynamic environments that require frequent additions to document collecti...

Publication date: 2003